184 research outputs found

    Mapping the complexity of transcription control in higher eukaryotes

    Get PDF
    Recent large analyses suggest the importance of combinatorial regulation by broadly expressed transcription factors rather than expression domains characterized by highly specific factors

    Spatial preferences of microRNA targets in 3' untranslated regions

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>MicroRNAs are an important class of regulatory RNAs which repress animal genes by preferentially interacting with complementary sequence motifs in the 3' untranslated region (UTR) of target mRNAs. Computational methods have been developed which can successfully predict which microRNA may target which mRNA on a genome-wide scale.</p> <p>Results</p> <p>We address how predicted target sites may be affected by alternative polyadenylation events changing the 3'UTR sequence. We find that two thirds of targeted genes have alternative 3'UTRs, with 40% of predicted target sites located in alternative UTR segments. We propose three classes based on whether the target sites fall within constitutive and/or alternative UTR segments, and examine the spatial distribution of predicted targets in alternative UTRs. In particular, there is a strong preference for targets to be located in close vicinity of the stop codon and the polyadenylation sites.</p> <p>Conclusion</p> <p>The transcript diversity seen in non-coding regions, as well as the relative location of miRNA target sites defined by it, has a potentially large impact on gene regulation by miRNAs and should be taken into account when defining, predicting or validating miRNA targets.</p

    Phylogenetic simulation of promoter evolution: estimation and modeling of binding site turnover events and assessment of their impact on alignment tools

    Get PDF
    Phylogenetic simulation of promoter evolution were used to analyze functional site turnover in regulatory sequences

    Deep learning for prediction of population health costs

    Full text link
    Accurate prediction of healthcare costs is important for optimally managing health costs. However, methods leveraging the medical richness from data such as health insurance claims or electronic health records are missing. Here, we developed a deep neural network to predict future cost from health insurance claims records. We applied the deep network and a ridge regression model to a sample of 1.4 million German insurants to predict total one-year health care costs. Both methods were compared to Morbi-RSA models with various performance measures and were also used to predict patients with a change in costs and to identify relevant codes for this prediction. We showed that the neural network outperformed the ridge regression as well as all Morbi-RSA models for cost prediction. Further, the neural network was superior to ridge regression in predicting patients with cost change and identified more specific codes. In summary, we showed that our deep neural network can leverage the full complexity of the patient records and outperforms standard approaches. We suggest that the better performance is due to the ability to incorporate complex interactions in the model and that the model might also be used for predicting other health phenotypes

    ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data

    Get PDF
    RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or employ models which are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Compared to previous methods which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM’s model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, it recovers known RBP motifs from CLIP-Seq data, and scales linearly on the input size, being considerably faster than MEMERIS and RNAcontext on large datasets while being on par with GraphProt. It is freely available on Github and as a Docker image

    Orthologous Transcription Factors in Bacteria Have Different Functions and Regulate Different Genes

    Get PDF
    Transcription factors (TFs) form large paralogous gene families and have complex evolutionary histories. Here, we ask whether putative orthologs of TFs, from bidirectional best BLAST hits (BBHs), are evolutionary orthologs with conserved functions. We show that BBHs of TFs from distantly related bacteria are usually not evolutionary orthologs. Furthermore, the false orthologs usually respond to different signals and regulate distinct pathways, while the few BBHs that are evolutionary orthologs do have conserved functions. To test the conservation of regulatory interactions, we analyze expression patterns. We find that regulatory relationships between TFs and their regulated genes are usually not conserved for BBHs in Escherichia coli K12 and Bacillus subtilis. Even in the much more closely related bacteria Vibrio cholerae and Shewanella oneidensis MR-1, predicting regulation from E. coli BBHs has high error rates. Using gene–regulon correlations, we identify genes whose expression pattern differs between E. coli and S. oneidensis. Using literature searches and sequence analysis, we show that these changes in expression patterns reflect changes in gene regulation, even for evolutionary orthologs. We conclude that the evolution of bacterial regulation should be analyzed with phylogenetic trees, rather than BBHs, and that bacterial regulatory networks evolve more rapidly than previously thought

    Improving the Caenorhabditis elegans Genome Annotation Using Machine Learning

    Get PDF
    For modern biology, precise genome annotations are of prime importance, as they allow the accurate definition of genic regions. We employ state-of-the-art machine learning methods to assay and improve the accuracy of the genome annotation of the nematode Caenorhabditis elegans. The proposed machine learning system is trained to recognize exons and introns on the unspliced mRNA, utilizing recent advances in support vector machines and label sequence learning. In 87% (coding and untranslated regions) and 95% (coding regions only) of all genes tested in several out-of-sample evaluations, our method correctly identified all exons and introns. Notably, only 37% and 50%, respectively, of the presently unconfirmed genes in the C. elegans genome annotation agree with our predictions, thus we hypothesize that a sizable fraction of those genes are not correctly annotated. A retrospective evaluation of the Wormbase WS120 annotation [1] of C. elegans reveals that splice form predictions on unconfirmed genes in WS120 are inaccurate in about 18% of the considered cases, while our predictions deviate from the truth only in 10%–13%. We experimentally analyzed 20 controversial genes on which our system and the annotation disagree, confirming the superiority of our predictions. While our method correctly predicted 75% of those cases, the standard annotation was never completely correct. The accuracy of our system is further corroborated by a comparison with two other recently proposed systems that can be used for splice form prediction: SNAP and ExonHunter. We conclude that the genome annotation of C. elegans and other organisms can be greatly enhanced using modern machine learning technology

    Evidence-ranked motif identification

    Get PDF
    A new computational method for the identification of regulatory motifs from large genomic datasets is presented her
    corecore